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Abstract 

The distribution of seismic moment is of capital interest to evaluate earthquake hazard, in 
particular regarding the most extreme events. We make use of likelihood-ratio tests to compare 
the simple Gutenberg-Richter power-law distribution with two statistical models that incorporate 
an exponential tail: the so-called tapered Gutenberg-Richter and the truncated gamma, when 
fitted to the global CMT earthquake catalog. The outcome is that the truncated gamma model 
outperforms the other two models. If simulated samples of the truncated gamma are reshuffled in 
order to mimic the time occurrence of the order statistics of the empirical data, this model turns 
out to be able to explain the empirical data both before and after the great Sumatra-Andaman 
earthquake of 2004. 

PACS numbers: 
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The Gutenberg-Richter (GR) law is not only of fundamental importance in statistical 
seismology [Tj but also a cornerstone of non-linear geophysics |2j and complex-systems sci¬ 
ence PJ. It simply states that, for a given region, the magnitudes of earthquakes follow 
an exponential probability distribution. As the (scalar) seismic moment is an exponential 
function of magnitude, when the GR law is expressed in terms of the former variable, it 
translates into a power-law distribution BE], he., 


f(M) oc 


1 

M^’ 
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with M seismic moment, f(M ) its probability density, (fulfilling J VM = 1), the 

sign “cx” denoting proportionality, and the exponent 1 + (3 taking values close to 1.65. 
This simple description provides rather good fits of available data in many cases PHH], 
with, remarkably, only one free parameter, /3. A totally equivalent characterization of the 
distribution uses the survivor function (or complementary cumulative distribution), defined 
as S(M ) = f(M')dM ', for which the GR power law takes the form S(M ) oc l/AT 3 . 

The power-law distribution has important physical implications, as it suggests an origin 
from a critical branching process or a self-organized-critical system puna nu. Nevertheless, 
it presents also some conceptual difficulties, due to the fact that the mean value (M) provided 
by the distribution turns out to be infinite [3f. These elementary considerations imply that 
the GR law cannot be naively extended to arbitrarily large values of M, and one needs to 
introduce additional parameters to describe the tail of the distribution, coming presumably 
from finite-size effects. However, a big problem is that the change from power law to a faster 
decay seems to take place at the highest values of M that have been observed, for which the 
statistics are very poor [T2] . 

Kagan [7] has enumerated the requirements that an extension of the GR law should 
fulfill; in particular, he considered, among other: (i) the so called tapered (Tap) Gutenberg- 
Richter distribution (also called Kagan distribution [13]), with a survivor function given by 
S t a P (M) oc e- M / 0 /M? and (ii) the (left-) truncated gamma (TrG) distribution, for which 
the density is f trg (M) oc e~ M ^ e /M l+ P. Note that both expressions have essentially the 
same functional form, but the former refers to the survivor function and the later to the 
density. As, in general, /(M) = — dS(M)/dM , differentiation of S tap (M ) in (i) shows the 
difference between both distributions. In any case, parameter 9 represents a crossover value 
of seismic moment, signaling a transition from power law to exponential decay; so, 6 gives the 
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scale of the finite-size effects on the seismic moment. The corresponding value of (moment) 
magnitude (sometimes called corner magnitude) can be obtained from m c = | (log 10 0 — 9.1), 
when the seismic moment is measured in N-m [141 [13] . 

Kagan [7j also argues that available seismic catalogs do not allow the reliable estimation of 
9, except in the global case (or for large subsets of this case), in particular, he recommends 
the use of the centroid moment tensor (CMT) catalog [16] - From his analysis of global 
seismicity, and comparing the values of the likelihoods, Kagan [7] concludes that the tapered 
GR distribution gives a slightly better fit than the truncated gamma distribution, for which 
in addition the estimation procedure is more involving. In any case, the /3 —value seems to 
be universal (at variance with 9), see also Refs. [9[ [TT, IE]. 

Nevertheless, the data analyzed by Kagan [7], from 1977 to 1999, comprises a period 
of relatively low global seismic activity, with no event above magnitude 8.5; in contrast, 
the period 1950 - 1965 witnessed 7 of such events [19]. Starting with the great Sumatra- 
Andaman earthquake of 2004, and following since then with 4 more earthquakes with m > 
8.5, the current period seems to correspond to the past higher levels of activity (up to the 
time of submitting this letter). Main et al. and Bell et al. [20, 21] have re-examined the 
problem of the seismic moment distribution including recent global data (shallow events 
only). Using a Bayesian information criterion (BIG), Bell et al. [21] compare the plain GR 
power law with the tapered GR distribution, and conclude that, although the tapered GR 
gives a significantly better fit before the 2004 Sumatra event, the occurrence of this changes 
the balance of the BIG statistics, making the GR power law more suitable; that is, the power 
law is more parsimonious, or simply, is enough for describing global shallow seismicity when 
the recent mega-earthquakes are included in the data. Similar results have been published 
in Ref. [22]. 

In this paper we revisit the problem with more recent data, using other statistical tools, 
reaching somewhat different conclusions. As in the mentioned papers, we analyze the global 
CMT catalog [16], in our case for the period between 1 January 1977 and 31 October 
2013, with the values of the seismic moment converted into N-m (1 dyn-cm = 10 - ' N-m). 
We restrict to shallow events (depth < 70 km) and in order to avoid incompleteness, to 
magnitude m > 5.75 (equivalent to M > 5.3 • 10 1 ' N-m), as in Refs. [20, |2Tj. This yields 
6150 events. 

As statistical tools, we use maximum likelihood estimation (MLE) for fitting, and like- 
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lihood ratio (LR) tests for comparison of different fits. Model selection tests based on the 
likelihood ratio have the advantage that the ratio is invariant with respect to changes of 
variables (if these are one-to-one [23]). Moreover, for comparing the fit of models in pairs, 
LR test is preferable in front of the computation of differences in BIC or AIC (Akaike in¬ 
formation criterion), as the test relies on the fact that the distribution of the LR is known, 
under a suitable null hypothesis (although the log-likelihood-ratio is equal to the difference 
of BIC or AIC when the number of parameters of the two models is the same). 

Maximum likelihood estimation is the best-accepted method in order to fit probability 
distributions, as it yields estimators which are invariant under re-parameterizations, and 
which are asymptotically unbiased and efficient for regular models, in particular for expo¬ 
nential families ra- When maximum likelihood is used under a wrong model, what one 
Ends is the closest model to the true distribution in terms of the Kullback-Leibler distance 
[23J. In order to perform MLE it is necessary to specify the densities of the distributions, 
including the normalization factors. In our case, all distributions are defined for M above 
the completeness threshold a, i.e., for M > a, being zero otherwise (as mentioned above, a 
is fixed to 5.3 • 10 1 ' N-m). For the power-law (PL) distribution (which yields the GR law for 
the distribution of M ) Eq. ([!]) reads 


(3 / a W +/ 3 


with (3 > 0. For the tapered Gutenberg-Richter, 


f tap {M-M) = 


[3 / a \ l +P 1 / a \P 
a \m) + 9 \m) 


D —(M—a)/e 


with [3 > 0 and 9 > 0 . And for the left-truncated (and extended to [3 > 0 ) gamma 
distribution; 
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with —oo < j3 < oo and 9 > 0 , and with T( 7 , z) = J°° x 1 ~ 1 e~ x dx the upper incomplete 
gamma function, defined for z > 0 when 7 < 0. We summarize the parameterization of the 
densities as f(M; 0), where 0 = {/3, 9} for the Tap and TrG distributions and 0 = /3 for 
the power law. Note that for the TrG distribution, it is clear that the exponent f3 is a shape 
parameter and 9 is a scale parameter; in fact, these parameters play the same role in the 
Tap distribution, which turns out to be a mixture of two truncated gamma distributions, 
one with shape parameter f3 and the other with [3 — 1, but with common scale parameter 9. 
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In contrast, the power law lacks a scale parameter. In all cases the completeness threshold 
a is a truncation parameter, but it is kept fixed and is not a free parameter, therefore. 

The knowledge of the probability densities allows the direct computation of the likelihood 
function as L(@) = n^i /(Af*; 0), where Mi are the N observational values of the seismic 
moment. Maximization of the likelihood function with respect the values of the parameters 
leads to the maximum-likelihood estimation 0 of these parameters, with L = L(0) the value 
of the likelihood at its maximum. We perform the MLE for the 3 models, obtaining, for the 
complete datasets, the values reported in Table [T| 


PL 

(3 6 (N-m) m c l (.M in N-m) 

MLE 0.685 00 00 -268466.609 

s.e. 0.009 

Tap 

MLE 0.684 3.3 10 22 8.94 -268465.315 

s.e. 0.00 9 2.6 10 22 0.23 

TrG 

MLE 0.681 6.7 10 22 9.15 -268464.844 

s.e. 0.00 9 6.6 10 22 0.27 


TABLE I: Maximum likelihood estimation of the parameters with their standard errors (s.e.) and 
maximum value of the log-likelihood function, l = In L when the PL, Tap, and TrG distributions 
are fitted to the seismic moment of shallow CMT earthquakes, using the whole data set (N = 6150). 
The standard error for (3 and 6 is computed from the Fisher information matrix and corresponds 
to one standard deviation of the distribution of each parameter. The standard error for m c is 
computed from that of 6 using the delta method [21]. 

A powerful method for comparison of pairs of models is the likelihood-ratio test, specially 
suitable when one model is nested within the other, which means that the first model is 
obtained as a special case of the second one. This is the case of the power-law distribution 
with respect to the other two distributions; indeed, the power law is nested both within the 
Tap and within the truncated gamma, as taking 6 —> oo in any of the two leads to the power- 
law distribution. This is easily seen taking into account that S tap (M) = (a/M) / 3 e~ < ' A/ ~ a )/ e , 
or just performing the limit in the expression for f tap (M) above. For the truncated gamma 
distribution, when doing the 6 —> oo limit in f trg (M) one needs to use that, for 7 < 0, 
z 1 / T( 7 , z) — y —7 when z —> 0, see Ref. [25] for 7 7 ^ —1, —2,... 
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Given two probability distributions, 1 and 2, with 1 nested within 2, the likelihood ratio 
test evaluates L 2 /Li, where L 2 is the likelihood (at maximum) of the “bigger” or “full” 
model (either Tap or TrG) and L v corresponds to the nested or null model (power law in 
our case). Taking logarithms we get the log-likelihood-ratio 

U = ln^ = l 2 -k, 

L\ 

with lj = In Lj = YliLi hi Oj), where f t denotes the probability density function of 

the distribution j for every j = 1, 2, and the MLE corresponds to @i = /3i and 0 2 = {/3 2 , 0 2 }■ 
In order to compare the £t provided by the two distributions, it is necessary to characterize 
the distribution of 1Z. 

Let rii and n 2 be the number of free parameters in the models 1 and 2, respectively. In 
general, if the models are nested, and under the null hypothesis that the data comes from 
the simpler model, the probability distribution of the statistic 27 Z in the limit N — > 00 is a 
chi-squared distribution with degrees of freedom equal to n 2 — 7i 1 > 0. So, 

27^ > 3.84 (2) 

with a level of risk equal to 0.05. Note that the chi-squared distribution provides a penalty 
for “model complexity” as the “width” of the distribution is given directly by the number 
of the degrees of freedom. This likelihood ratio test constitutes the best option to choose 
among models 1 and 2, in the sense that it has a convergence to its asymptotic distribution 
faster than any other test [26] . The null and alternative hypotheses correspond to accept 
model 1 or 2, respectively, although the acceptance of model 1 does not imply the rejection 
of 2, it is simply that the “full” model 2 does not bring any significant improvement with 
respect the simpler model 1. 

However, when the nesting of distribution 1 within 2 takes place in such a way that the 
space of parameters of the former one lies within a boundary of the space of parameters of 
distribution 2, the approach just explained for the asymptotic distribution of 21Z is not valid 
[23 HU- This is the case when testing both the Tap or the TrG distributions in front of the 
power-law distribution, as the 9 —> 00 limit of the latter corresponds to the boundary of the 
parameter space of the two other distributions. But the asymptotic theory of Refs. [271128] 
is also invalid, as the power-law distribution lacks the necessary regularity conditions, due 
to the divergence of their moments. This illustrates part of the difficulties of performing 
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proper model selection when fractal-like distributions are involved |29j. In order to obtain 
the distribution of 277 and from there the p —values of the LR tests, we are left to the 
simulation of the null hypothesis. We advance that the results seem to indicate that the 
distribution of 277, for high percentiles, is chi-square with one degree of freedom, but we 
lack a theoretical support for this fact. 

Let us proceed, using this method, by comparing the performance of the power-law and 
Tap fits when applied to the global shallow seismic activity, for time windows starting always 
in 1977 and ending in the successive times indexed by the abscissa in Fig. [IJa) (as in Ref. 
[21]). The log-likelihood-ratio of these fits (times 2), is shown in the figure together with the 
critical region of the test. In agreement with Bell et al. m we find that: (i) the power-law 
fit can be safely rejected in front of the Tap distribution for any time window ending between 
1980 and before 2004; and (ii) the results change drastically after the occurrence of the great 
2004 Sumatra earthquake, for which the power law cannot be rejected at the 0.05 level. So, 
for parsimony reasons, the power law becomes preferable in front of the Tap distribution 
for time windows ending later than 2004. The fact that the Tap distribution cannot be 
distinguished from the power law is also in agreement with previous results showing that 
the contour lines in the likelihood maps of the Tap distribution are highly non-symmetric 
and may be unbounded for smaller levels of risk [TJ 22j [30] . 

When we compare the power-law fit with the truncated gamma, using the same test, for 
the same data, the results are more significant, see Fig. [l](b) . The situation previous to 
2004 is the same, with an extremely poor performance of the power law; but after 2004, 
despite a big jump again in the value of the likelihood ratio, the power law continues as 
being non-acceptable, at the 0.05 level. It is only after the great Tohoku earthquake of 
2011 that the p —value of the test enters into the acceptance region, but keeping values not 
far from the 0.05 limit. From here we conclude that, in order to find an alternative to the 
power-law distribution, the truncated gamma distribution is a better option than the Tap 
distribution, as it is more clearly distinguishable from the power law (for this particular 
data). Nevertheless, a comparison between these two distributions (Tap and TrG) seems 
pertinent. 

When the models are not nested, as it happens if we want a direct comparison between 
the Tap and the TrG distributions, the procedure we use is the likelihood ratio test of Vuong 
for non-nested models [HI., 32]. In this case the critical values depend on the sample size, N, 
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FIG. 1: Results of likelihood ratio tests for nested and non-nested models. The points denote the 
value of the statistic 277 or 77 (depending on the test) and the dashed lines show the critical value 
of the corresponding test (at level 0.05). For the nested case, boxplots show the distribution of 2 1Z 
for 10000 simulations of the power-law null hypothesis, from which the critical value is computed. 
The abscissa corresponds to the ending point of a time window starting always in 1 Jan 1977. Note 
that the year is considered a continuous variable (not a categorical variable), so, the time window 
ending on 31 Dec 2004 takes value 2004.99 • • • ~ 2005. (a) Tap distribution versus power law. (b) 
Truncated gamma versus power law. (c) Truncated gamma versus Tap (non-nested case). 


turning out to be that, when N is large, 77 is normally distributed with standard deviation 
s\fN , where s denotes the standard deviation of the set 

{hi ftrg (, fitrgi @trg) hi ftap (T7;, f3tap 5 @tap) } 

for i — 1,..., N and 77 = Itrg — hap- Then, we accept that there exists a significant difference 
between the models if 

|77| > 1.96 sVN (3) 

at a level of risk equal to 0.05, with the model with larger log-likelihood being the preferred 
one. The critical value of the test arises because the null hypothesis is that the mean value 
of 1Z is zero (i.e., both models are equally close to the true distribution). Note that the 
alternative hypothesis corresponds to accept that the difference between the fit provided 
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by the models is significant. As the number of parameters is the same for the Tap and 
TrG models, their log-likclihood-ratio coincides with the difference in BIC or AIC, but, as 
mentioned above, the LR test incorporates a statistical test which specifies the distribution 
of the statistic under consideration. 

Figure [l](c) shows the evolution of the log-likelihood-ratio between the two models, for 
different time windows (starting always in 1977), together with the critical region of the test 
given by the Eq. ([3]). One can see how the fits provided by the Tap and TrG distributions 
do not exhibit significant difference, although the TrG provides, in general, slightly higher 
likelihoods. After the mega-event in 2004 the performance of the TrG fit improves, ap¬ 
proaching the limit of significance. This reinforces our conclusion that the TrG distribution 
is preferred in front of the Tap and power-law distributions. 

In order to gain further insight, we simulate random samples following the truncated 
gamma distribution, with the parameters f3 trg and 9 trg obtained from ML estimation of the 
complete dataset (Table [T]), with the same truncation parameter a and number of points (N = 
6150) also. To avoid that the conclusions depend on the time correlation of magnitudes, we 
reshuffle the simulated data in such a way that the occurrence of the order statistics is the 
same as for the empirical data; in other words, the largest simulated event is assigned to 
take place at the time of the 2011 Tohoku earthquake (the largest of the CMT catalog [2Tj). 
the second largest at the time of the 2004 Sumatra event, and so on. In this way, we model 
earthquake seismic moments as arising from a gamma distribution with fixed parameters 
and with occurrence times given by the empirical times and with the same seismic moment 
correlations as the empirical data, approximately. 

We simulate 1000 datasets with N = 6150 each. The results, displayed in Fig. [2j show 
that the behavior of the empirical data is not atypical in comparison with this gamma 
modeling. In nearly all time windows the empirical data lies in between the first and third 
quartile of the simulated data, although before 2004 the empirical values are close to the 
third quartile whereas after 2004 they lay just below the median. This leads us to compute 
the statistics of the jump in the log-likelihood-ratio between 2004 and 2005. The estimated 
probability of having a jump larger than the empirical value is around 4.5 %, which is 
not far from what one could accept from the gamma modeling explained above. Thus, a 
TrG distribution, with fixed parameters, is able to reproduce the empirical findings, if the 
peculiar time ordering of magnitude of the real events is taken into account. Notice also 
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FIG. 2: Comparison of the empirical log-likelihood-ratios between the TrG and power law with 
those of 1000 simulations of the TrG distribution, using the final parameters of Table [I] (i.e., 
/3 = 0.681 and m c = 9.15). Simulated seismic moments are reshuffled as explained in the text to 
make the comparison possible. Simulation results are displayed using boxplots, representing the 
three quartiles of the distribution of 21Z. 
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FIG. 3: Comparison of the values of the estimated parameters of the TrG distribution, f3 trg and 
fh c trg j for the empirical data and for 1000 simulations of the TrG distribution, using the final 
parameters of Table |T} Simulated seismic moments are reshuffled as explained in the text. The 
different stability of both parameters is apparent. 


that, although the simulated data come from a TrG distribution, they are not distinguishable 
from a power law for about half of the simulations of the last time windows, as the critical 
region is close to the median indicated by the boxplots. 

We can also compare the evolution of the estimated parameters for the empiral dataset 
and for the reshuffled TrG simulations, with a good agreement again, see Fig. [3] There, it 
is clear that although the exponent (3 reaches very stable values relatively soon, the scale 
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FIG. 4: As Fig. [2j but simulating a power law with parameter (3 = 0.685 (Table [T| instead of a 
TrG distribution. The reshuffling is also as in Fig. [2j as explained in the text. The simulations 
cannot explain the large empirical values of the log-likelihood-ratio. 


parameter 6 (equivalent to m c ) is largely unstable, and the occurrence of the biggest events 
makes its value increase. 

As a complementary control we invert the situation, simulating 1000 syntetic power-law 
datasets with /3 = 0.685 (Table [T]) , a = 5.3 ■ 10 1 ' N-m, and N = 6150, for which the same 
time reshuffling is perfomed, in such a way that the order of the order statistics is the same. 
In this case, the results of the simulations lead to much smaller values of the log-ratio, for 
which the power-law distribution cannot be rejected (as expected), in contrast with the 
empirical data, see Fig. |4} 

In summary, the truncated gamma distribution represents the best alternative to model 
global shallow earthquake seismic moments, in comparison with the tapered GR distribu¬ 
tion and the power law. The preponderance of the gamma model is maintained after the 
occurrence of the mega-earthquakes taking place from 2004 and it is only after the 2011 To- 
hoku earthquake that it is difficult to decide between power law and TrG. We have verified 
that these results are qualitatively similar if we restrict our study to subduction zones, as 
defined by the Flinn-Engdahl’s regionalization [B], with the main difference that the values 
of Itrg ~ Ipi become somewhat smaller and therefore the power-law hypothesis cannot be 
rejected after the Tohoku earthquake. 

In order to reproduce the time evolution of the statistical results it suffices that inde¬ 
pendent gamma seismic moments, with fixed parameters, are reshuffled so that the peculiar 
empirical time correlations of magnitudes are maintained. So, although the scale parameter 
6 is not stabilized, and the occurrence or not of more mega-earthquakes could significantly 
change its value C2. the current value is enough to explain the available data. It would be 
very interesting to investigate if the high values of the likelihood ratio attained before the 
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2004 Sumatra event could be employed to detect the end of periods of low global seismic 
activity. Certainly, more data would be necessary for that purpose [T2j. 

As an extra argument in favor of the truncated gamma distribution in front of the tapered 
GR, we can bring not a statistical evidence but physical plausibility; indeed, the former 
distribution can be justified as coming from a branching process that is slightly below its 
critical point [33j 44]. Further reasons that may support the truncated gamma are that this 
arises (i) as the maximum entropy outcome under the constrains of fixed (arithmetic) mean 
and fixed geometric mean of the seismic moment [35]; (ii) as the closest to the power law, in 
terms of the Kullback-Leibler “distance”, when the mean seismic moment is fixed [35]; and 
(iii) as a stable distribution under a fragmentation process with a power-law transition rate 

ffl. 
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